-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[ntuple] Attributes: update specs and add de/serialization #19894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Test Results 21 files 21 suites 3d 21h 11m 14s ⏱️ For more details on these failures, see this check. Results for commit 024401a. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle looks good to me. Some comments.
b0449c0
to
48b72ff
Compare
I updated the PR and uniformed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Some minor comments.
Since this is changing the binary format, let's perhaps get a second approval.
48b72ff
to
07b8b20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments on the spec additions and the iterator implementations
07b8b20
to
6c3a26f
Compare
1. it cannot have linked Attribute RNTuples itself; | ||
2. the Alias columns sections, both in its header and footer, must be empty (i.e. none of the Attribute Set RNTuple's | ||
Fields can be Projected Fields); | ||
3. none of its fields may have a structural role of 0x04 (i.e. it must not contain a ROOT streamer object); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a choice or a technical limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a choice (see other answer)
An Attribute Set RNTuple has a number of restrictions compared to a regular RNTuple: | ||
|
||
1. it cannot have linked Attribute RNTuples itself; | ||
2. the Alias columns sections, both in its header and footer, must be empty (i.e. none of the Attribute Set RNTuple's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there (somewhere else) a more detailed explanation/reason for this limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided that we want to keep the Model of the attribute sets "as simple as possible" by removing certain advanced functionalities that we don't see as useful in the context of metadata. We may relax these restrictions in the future should the need arise.
|
||
### Attribute Schema Version | ||
Each Attribute Set is created with a user-defined Model. This Model is not used directly by the underlying Attribute | ||
Set RNTuple, but it is augmented with internal fields used to store additional data that serve to associate each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When inspecting the Attribute Set RNTuple
will the additional fields be exposed or hidden? (And If they are exposed would a user be able to distinguish the implicit vs the explicit part of the model?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will be hidden: the user doesn't have access to the inner model but only to the fields they defined on the user model.
if (fnBufSizeLeft() < static_cast<int>(sizeof(std::uint64_t))) | ||
return R__FAIL("record frame too short"); | ||
std::uint16_t vMajor, vMinor; | ||
bytes += DeserializeUInt16(bytes, vMajor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a:
if (vMajor > 1)
throw;
and/or did I misunderstand:
A change in Major version number indicates a breaking, non-forward-compatible change in the schema: readers should
refuse reading an Attribute Set whose Major Schema Version is unknown.
and/or should that statement be weakened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should add that check, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think I'll add it in a later PR when I define a constant with the current major version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in principle! Just some additional suggestions to the spec to improve readability and consistency.
|
||
## Linked Attribute Sets | ||
|
||
An RNTuple may have zero or more linked Attribute Sets, containing metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An RNTuple may have zero or more linked Attribute Sets, containing metadata. | |
An RNTuple may have zero or more linked Attribute Sets, containing user-defined metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessarily true, as some metadata may be automatically defined (e.g. ROOT's internal attributes). It is anyway not relevant for the purposes of specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is relevant to distinguish between the type of metadata the attributes store and the other kind of metadata (i.e. the ones in the header and footer) that is mentioned throughout this spec. It's true that "user-defined" may not be complete enough, how about something like "entry metadata", "metadata over (ranges of) entries" or "metadata related to the stored contents" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; I'll think of an appropriate phrasing.
6c3a26f
to
024401a
Compare
An attribute set record frame has the following contents: | ||
``` | ||
0 1 2 3 | ||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| Schema Version Major | Schema Version Minor | | ||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| Attribute Anchor Uncompressed Size | | ||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
``` | ||
|
||
- The first 32 bits contain the _Attribute Schema Version_. This is split into a _Major_ (16 LSB) and a | ||
_Minor_ (16 MSB) version. The Schema Version is described below; | ||
- a 32-bit unsigned integer follows, containing the uncompressed size of the Attribute Anchor. | ||
|
||
These fields are followed by: | ||
|
||
- a locator pointing to the Attribute RNTuple's anchor; | ||
- a string containing the Attribute Set's name. All linked Attribute Sets must have a non-empty, distinct name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not being clear: I think this part must stay in the "Footer Envelope" section. The remaining restrictions and explanations should stay in this new section, thanks.
First PR of a series to merge the RNTuple Attributes into master. The final result will be this, although the commits will be reorganized to be more coherent and reviewable.
This first PR updates the binary format specification (introducing a new minor version) and updates the Serializer and Descriptor code to match. This is backward-compatible and no Attribute can be written yet since the writer API will be introduced later.
Checklist: